3. Main questions
Question 1: Does the cell doubling time correlate with reduced drug sensitivity?
As we mentioned in our presentation, we want to create a model to predict GI50-values thus to predict, if Lapatinib is a good choice. Therefore, we took several data and performed 3 regression analyzes.
Linear regression ~ Doublingtime
The first linear model trys to predict the G-50 value under the data of the doubling time.
Linear regression ~ Foldchange-means
The second linear model trys to predict the G-50 value under the data of the Foldchange-means.
Multiple regression
As a last part, we did a multiple regression with both datasets to predict GI50-values.
Table of conclusion
The following table shows important values.
| Foldchange | Doublingtime | Multiple regression | |
|---|---|---|---|
| R-squared-value | 0.0769593 | 0.0504364 | 0.1025132 |
| F-statistic | 3.5017851 | 2.4433075 | 2.3415627 |
| RMSE of test-dataset | 0.8256582 | 0.4511192 | 0.6881792 |
The data are hard to discuss, because we use random values for our test and training dataset. Hence you get different values on every run. We could use “selected values” (First 20) but this would be a kind of cheating.
Question 2: Does Lapatinib have a similar effect on lung cancer as Erlotinib?
As we read in the paper “Antitumor and antiangiogenic effect of the dual EGFR and HER-2 tyrosine kinase inhibitor lapatinib in a lung cancer model. (Diaz et al., 2010)”, lapatinib and erlotinib are said to have similar effects. To verify this, we first looked at the correlation of the drugsand then created a graph showing the difference between GI50 for a certain cell line and the middle GI50 for both drugs.
Matrix of correlations
Erlotinib and lapatinib have a strong correlation.
Plot Erlotinib and Lapatinib, coloured by tissue
The difference from the NegLogGI50 for a particular cell line and the mean NegLogGI50 is plotted here for Erlotinib and Lapatinib.
## [1] 0.6528188
A Pearson correlation coefficient of ~ 0.65 confirms that these patterns are similar. One reason for this is that both erlotinib and lapatinib are EGFR inhibitors. Cell lines that were more sensitive are displayed as bars that project to the right of the mean. Cell lines that were less sensitive are displayed with bars projected to the left.
Lung cellines
To answer our more specific question, whether lapatinib also has an effect on lung cancer, we will only look at the cell lines in lung cancer.
The difference from the NegLogGI50 for cell line from lung tissue and the mean NegLogGI50 is plotted here for Erlotinib and Lapatinib.
## [1] 0.9609488
A pearson correlation coefficent of ~ 0.96 suggests that Lapatinib has a similar effect on lung cancer as Erlotinib.
Anova
selection of Lapatinib and Erlotinib treated cells
lapa<-data.frame(Metadata[which(Metadata[,'drug'] == "lapatinib"), ])
erlo<-data.frame(Metadata[which(Metadata[,'drug'] == "erlotinib"), ])
el<-right_join(lapa,erlo, by="cell")
rmv.rows = apply(el, 1, function(x) {
sum(is.na(x))
}) # Go through each row and sum up all missing values
row.names(rmv.rows)
Create data frame with lapatinib and erlotinib data
fc<-data.frame(Treated-Untreated)
dim(fc)
## [1] 13299 819
all<-data.frame(fc[grep("lapatinib|erlotinib", colnames(fc))])
dim(all)
## [1] 13299 113
since erlotinip contains more columns than lapatinib, we have to remove these columns
all.rmv<-data.frame(all %>% select(
-"CCRF.CEM_erlotinib_10000nM_24h",
-"HL.60_erlotinib_10000nM_24h",
-"HT29_erlotinib_10000nM_24h",
-"K.562_erlotinib_10000nM_24h",
-"LOX_erlotinib_10000nM_24h",
-"SR_erlotinib_10000nM_24h",
-"COLO.205_lapatinib_10000nM_24h"))
dim(all.rmv)
## [1] 13299 106
Checking the rows
la<-data.frame(all.rmv[grep("lapatinib", colnames(all.rmv))])
ncol(la)
## [1] 53
er<-data.frame(all.rmv[grep("erlotinib", colnames(all.rmv))])
ncol(er)
## [1] 53
erla<-data.frame(er,la)
ncol(all.rmv) #to prove if the columns are removed
## [1] 106
Anova
drug<-c(rep('Erlotinib',53), rep('Lapatinib',53))
expression_drug<-apply(erla, MARGIN = 2, sum)
df_drug<-data.frame(expression_drug, drug)
ggboxplot (data = df_drug, x="drug", y="expression_drug", color = "drug",
add = "jitter", legend = "none")+
rotate_x_text(angle = 45)+
stat_compare_means(method = "anova")+ # Add global annova p-value
stat_compare_means(label = "p.signif", method = "t.test",
ref.group = ".all.", hide.ns = TRUE) # Pairwise comparison against all
Question 3: Comparing lapatinib treated breast and cns celllines
L_fc <- select(Fold_Change, contains("Lapa"))
L_fc <- as.data.frame(t(L_fc))
rownames(Metadata) <- Metadata$sample
L_treated <- select(Treated, contains("Lapa"))
L_treated <- t(L_treated)
L_untreated <- select(Untreated, contains("Lapa"))
L_untreated <- t(L_untreated)
# selecting breast Lapatinib samples
breast <- Metadata[Metadata[,'tissue']=="Breast",]
rownames(breast) <- breast$sample
rownames(breast) <- gsub(x = rownames(breast), pattern = "-", replacement = ".")
breastFC <- subset(L_fc, rownames(L_fc) %in% rownames(breast))
breastTreated <- subset(L_treated, rownames(L_treated) %in% rownames(breast))
breastUntreated <- subset(L_untreated, rownames(L_untreated) %in% rownames(breast))#
# selecting CNS Lapatinib samples
cns <- Metadata[Metadata[,'tissue']=="CNS",]
rownames(cns) <- cns$sample
rownames(cns) <- gsub(x = rownames(cns), pattern = "-", replacement = ".")
cnsFC <- subset(L_fc, rownames(L_fc) %in% rownames(cns))
cnsTreated <- subset(L_treated, rownames(L_treated) %in% rownames(cns))
cnsUntreated <- subset(L_untreated, rownames(L_untreated) %in% rownames(cns))
#performing a paired t-test of treated and untreated samples
t_test_cns <- col_t_paired(cnsTreated, cnsUntreated, alternative = "two.sided", mu = 0,conf.level = 0.95)
t_test_breast <- col_t_paired(breastTreated, breastUntreated, alternative = "two.sided", mu = 0,conf.level = 0.95)
#obtaining Benjamini-Hochberg adjusted p-values
pval_cns <- t_test_cns$pvalue
pval_breast <- t_test_breast$pvalue
fdr_cns <- p.adjust(pval_cns, "BH")
fdr_breast <- p.adjust(pval_breast, "BH")
#obtaining mean FC values over all samples
breastFCm <- as.numeric(colMeans(breastFC))
cnsFCm <- as.numeric(colMeans(cnsFC))
genes <- colnames(breastFC)
## breast volcano plot
#creating a matrix containg all needed values for plotting
diff_df_breast <- data.frame(gene = genes, Fold = breastFCm, FDR = fdr_breast)
diff_df_breast$absFold <- abs(diff_df_breast$Fold)
head(diff_df_breast)
## gene Fold FDR absFold
## 1 A1CF 0.037268413 0.8765540 0.037268413
## 2 A2M -0.032213825 0.7188608 0.032213825
## 3 A4GALT 0.006012452 0.9793436 0.006012452
## 4 A4GNT -0.053969518 0.4235638 0.053969518
## 5 AAAS 0.081656784 0.5283372 0.081656784
## 6 AACS 0.023767096 0.8115022 0.023767096
# add a grouping column; default value is "not significant"
diff_df_breast$group <- "NotSignificant"
# change the grouping for the entries with significance but not a large enough Fold change
diff_df_breast[which(diff_df_breast['FDR'] < 0.5 & (diff_df_breast['absFold']) < 0.2 ),"group"] <- "Significant"
# change the grouping for the entries a large enough Fold change but not a low enough p value
diff_df_breast[which(diff_df_breast['FDR'] > 0.5 & (diff_df_breast['absFold']) > 0.2 ),"group"] <- "FoldChange"
# change the grouping for the entries with both significance and large enough fold change
diff_df_breast[which(diff_df_breast['FDR'] < 0.5 & (diff_df_breast['absFold']) > 0.2 ),"group"] <- "Significant&FoldChange"
# Find and label the top peaks.
top_peaks_breast <- diff_df_breast[with(diff_df_breast, order(Fold, FDR)),][1:10,]
top_peaks_breast <- rbind(top_peaks_breast, diff_df_breast[with(diff_df_breast, order(-Fold, FDR)),][1:10,])
# Add gene labels for all of the top genes we found
# creating an empty list, and filling it with entries for each row in the dataframe
# each list entry is another list with named items that will be used
a <- list()
for (i in seq_len(nrow(top_peaks_breast))) {
m <- top_peaks_breast[i, ]
a[[i]] <- list(
x = m[["Fold"]],
y = -log10(m[["FDR"]]),
text = m[["gene"]],
xref = "x",
yref = "y",
showarrow = TRUE,
arrowhead = 0.5,
ax = 20,
ay = -40
)
}
plot_breast <- plot_ly(data = diff_df_breast, x = diff_df_breast$Fold, y = -log10(diff_df_breast$FDR), type = "scatter", text = diff_df_breast$gene, mode = "markers", color = diff_df_breast$group) %>%
layout(title ="Volcano Plot of Lapatinib breast cancer samples",
xaxis = list(title="log2 Fold Change"),
yaxis = list(title="FDR")) %>%
layout(annotations = a)
plot_breast
###thresholds still need to be discussed
## CNS volcano plot
diff_df_cns <- data.frame(gene = genes, Fold = cnsFCm, FDR = fdr_cns)
diff_df_cns$absFold <- abs(diff_df_cns$Fold)
head(diff_df_cns)
## gene Fold FDR absFold
## 1 A1CF 0.066575311 0.5939566 0.066575311
## 2 A2M 0.038348381 0.6873009 0.038348381
## 3 A4GALT 0.000390011 0.9980719 0.000390011
## 4 A4GNT -0.018219799 0.8780106 0.018219799
## 5 AAAS 0.014723327 0.9008420 0.014723327
## 6 AACS 0.003887384 0.9870209 0.003887384
# add a grouping column; default value is "not significant"
diff_df_cns$group <- "NotSignificant"
# change the grouping for the entries with significance but not a large enough Fold change
diff_df_cns[which(diff_df_cns['FDR'] < 0.5 & (diff_df_cns['absFold']) < 0.2 ),"group"] <- "Significant"
# change the grouping for the entries a large enough Fold change but not a low enough p value
diff_df_cns[which(diff_df_cns['FDR'] > 0.5 & (diff_df_cns['absFold']) > 0.2 ),"group"] <- "FoldChange"
# change the grouping for the entries with both significance and large enough fold change
diff_df_cns[which(diff_df_cns['FDR'] < 0.5 & (diff_df_cns['absFold']) > 0.2 ),"group"] <- "Significant&FoldChange"
# Find and label the top peaks..
top_peaks_cns <- diff_df_cns[with(diff_df_cns, order(Fold, FDR)),][1:10,]
top_peaks_cns <- rbind(top_peaks_cns, diff_df_cns[with(diff_df_cns, order(-Fold, FDR)),][1:10,])
a <- list()
for (i in seq_len(nrow(top_peaks_cns))) {
m <- top_peaks_cns[i, ]
a[[i]] <- list(
x = m[["Fold"]],
y = -log10(m[["FDR"]]),
text = m[["gene"]],
xref = "x",
yref = "y",
showarrow = TRUE,
arrowhead = 0.5,
ax = 20,
ay = -40
)
}
plot_cns <- plot_ly(data = diff_df_cns, x = diff_df_cns$Fold, y = -log10(diff_df_cns$FDR),type = "scatter", text = diff_df_cns$gene, mode = "markers", color = diff_df_cns$group) %>%
layout(title ="Volcano Plot of Lapatinib CNS cancer samples",
xaxis = list(title="log2 Fold Change"),
yaxis = list(title="FDR"))%>%
layout(annotations = a)
plot_cns
# selecet top peak genes common in cns and breast tissue
tpb_comparison <- subset(top_peaks_breast, gene %in% top_peaks_cns$gene)
tpc_comparison <- subset(top_peaks_cns, gene %in% top_peaks_breast$gene)
# order common genes alphabetically
tpb_comparison <- tpb_comparison[order(tpb_comparison$gene),]
tpc_comparison <- tpc_comparison[order(tpc_comparison$gene),]
## creating heat map of FCs to compare values
cor_mat <- cbind("breast" = tpb_comparison$Fold, "cns" = tpc_comparison$Fold)
rownames(cor_mat) <- tpb_comparison$gene
data <- read.delim
pheatmap(
mat = cor_mat,
color = magma(10),
border_color = "black",
show_colnames = TRUE,
show_rownames = TRUE,
drop_levels = TRUE,
fontsize = 14,
main = "Comparison:
FC levels of cns and breast top peak genes"
)
#correlation between top peak gene expression patterns in breast and cns tissues treated by lapatinib
cor_mat <- as.data.frame(cor_mat)
dif.FC.BC = data.frame(breast = cor_mat$breast - mean(t(breastFC)), cns = cor_mat$cns - mean(t(cnsFC))) #breast data - mean value, cns data - mean value
dif.FC.BC$gene = rownames(cor_mat)
# PLot
ggplot(dif.FC.BC,aes(x = gene, y = breast)) +
geom_bar(stat = "identity", fill = "skyblue") +
geom_text(aes(label = round(breast, 2)), vjust = -0.5, color = "black", size = 3) +
coord_flip() + labs(title = "Mean graph plot of Fold Change for breast top peak genes")
ggplot(dif.FC.BC,aes(x = gene, y = cns)) +
geom_bar(stat = "identity", fill = "skyblue") +
geom_text(aes(label = round(cns, 2)), vjust = -0.5, color = "black", size = 3) +
coord_flip() + labs(title = "Mean graph plot of Fold Change for CNS top peak genes")
#correlation
cor(cor_mat$breast, cor_mat$cns, method = "pearson")
## [1] 0.921812